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Abstract. We present a novel solution technique for the blind subspace 
deconvolution (BSSD) problem, where temporal convolution of multidi- 
mensional hidden independent components is observed and the task is 
to uncover the hidden components using the observation only. We carry 
out this task for the undercomplete case (uBSSD): we reduce the orig- 
inal uBSSD task via linear prediction to independent subspace analysis 
(ISA), which we can solve. As it has been shown recently, applying tem- 
poral concatenation can also reduce uBSSD to ISA, but the associated 
ISA problem can easily become 'high dimensional' [6]. The new reduction 
method circumvents this dimensionality problem. We perform detailed 
studies on the efficiency of the proposed technique by means of numerical 
simulations. We have found several advantages: our method can achieve 
high quality estimations for smaller number of samples and it can cope 
with deeper temporal convolutions. 



1 Introduction 

There is a growing interest in independent component analysis (ICA) and 
blind source deconvolution (BSD) for signal processing and hidden component 
searches. ICA has been used for many purposes, including (i) feature extraction, 
(ii) denoising, (iii) processing of financial and neurobiological data, e.g. fMRI, 
EEG, and MEG. BSD has also shown potentials in several areas, for example 
(i) in remote sensing applications: passive radar/sonar processing, (ii) in image- 
deblurring and image restoration, (iii) in acoustics, including speech enhance- 
ment using microphone arrays, (iv) in multi- antenna wireless communications 
and in sensor networks, (v) in biomedical signal — EEG, EGG, MEG, fMRI — 
analysis, (vi) in optics, and (vii) in seismic exploration. For recent reviews in 
ICA and BSD themes see, e.g., [1,2] and [3], respectively. 

Traditionally, ICA is one-dimensional in the sense that all sources are as- 
sumed to be independent real valued stochastic variables. The traditional ex- 
ample of ICA is the so-called cocktail-party problem, where there are D sound 
sources and D microphones and the task is to separate the original sources from 
the observed mixed signals. Clearly, applications where not all, but only certain 
groups of the sources are independent may have high relevance in practice. In this 



case, independent sources can be multidimensional. For example, there could be 
independent groups of people talking about independent topics at a conference, 
or independent rock hands may be playing at a party. This is the independent 
subspace analysis (ISA) extension of ICA [4]. Strenuous efforts have been made 
to develop ISA algorithms, where the theoretical problems concern mostly (i) 
the estimation of the entropy or of the mutual information, or (ii) joint block 
diagonalization. A recent list of possible ISA solution techniques can be found 
in [6]. 

Another extension of the original ICA task is the BSD problem [3], where the 
observation is a temporal mixture of the hidden components. Such a problem 
emerges, e.g., if the cocktail-party is held in an echoic room. A novel task, the 
blind subspace deconvolution (BSSD) [6] arises if we combine the ISA and the 
BSD assumptions. One can think of this task as the separation problem of the 
pieces played simultaneously by independent rock bands in an echoic stadium. 
One of the most stringent applications of BSSD could be the analysis of EEG or 
fMRI signals. The ICA assumptions could be highly problematic here, because 
some sources may depend on each other, so an ISA model seems better. Fur- 
thermore, the passing of information from one area to another and the related 
delayed and transformed activities may be modeled as echoes. Thus, one can ar- 
gue that BSSD may fit this important problem domain better than ICA or even 
ISA. It has been shown in [6] that the undercomplete BSSD task (uBSSD) — 
where in terms of the cocktail-party problem there are more microphones than 
acoustic sources — can be reduced to ISA by means of temporal concatenation.^ 
However, the reduction technique may lead to 'high dimensions' in the associ- 
ated ISA problem. Here, an alternative reduction method solution is introduced 
for uBSSD and this solution avoids the increase of ISA dimensions. Namely, we 
show that one can apply the linear prediction method to reduce the uBSSD task 
to ISA such that the dimension of the associated ISA problem equals to the di- 
mension of the original hidden sources. As an additional advantage, we shall see 
that this reduction principle is more efficient on problems with deeper temporal 
convolutions. 

The paper is built as follows: Section 2 formulates the problem domain. 
Section 3 shows how to reduce the uBSSD task to an ISA problem. Section 4 
contains the numerical illustrations. Section 5 contains a short summary. 

2 The BSSD Model 

We define the BSSD task in Section 2.1. Earlier BSSD reduction principles are 
reviewed in Section 2.2. 

2.1 The BSSD Equations 

Here, we define the BSSD task. Assume that we have M hidden, independent, 
multidimensional components (random variables). Suppose also that only their 

^ The complete, and in particular the overcomplete BSSD task is challenging and no 
general solution is known yet. 



casual FIR filtered mixture is available for observation: 



L 



x(i) 



(1) 



where s(t) = [s^(t); . . . ; s^(t)] G is a vector concatenated of components 
s^(t) G M^. Here, for the sake of notational simplicity we used identical di- 
mension for each component. For a given m, s^(t) is i.i.d. (independent and 
identically distributed) in time t, there is at most one Gaussian in s^s, and 
/(s^, . . . , s^) = 0, where / stands for the mutual information of the arguments. 
The total dimension of the components is Dg := Md, the dimension of the ob- 
servation X is I^a^. Matrices G R^^^^^ = 0, . . . , L) describe the convolutive 
mixing. Without any loss of generality it may be assumed that ^[s] = 0, where 
E denotes the expectation value. Then ^[x] =0 holds, as well. The goal of the 
BSSD problem is to estimate the original source s(t) by using observations x(t) 
only. The case L = corresponds to the ISA task, and if d = 1 also holds then 
the ICA task is recovered. In the BSD task d = 1 and L is a non-negative integer. 
Dx > Ds is the undercomplete, Dx = Dg is the complete^ and Dx < Ds is the 
overcomplete task. Here, we treat the undercomplete BSSD (uBSSD) problem. 

For consecutive reductional steps we rewrite the BSSD model using operators. 
Let H[z] := Y^jLotiiz~^ G R[z]^">^^^ denote the Dx x Dg polynomial matrix 
corresponding to the convolutive mixing, in a one-to-one manner. Here, z is the 
time-shift operation, that is {z~^u){t) := u{t — 1). Now, the BSSD equation (1) 
can be written as 



In the uBSSD task it is assumed that ii[z] has a polynomial matrix left inverse. 
In other words, there exists polynomial matrix W[z] G M[2:]^^^^=^ such that 
W[z]H[z] is the identity mapping. It can be shown [7] that for Dx > Dg such a 
left inverse exists with probability 1, under mild conditions. The mild condition 
is as follows: Coefficients of polynomial matrix H[2:], that is, the random matrix 
[Ho;...;Hl] is drawn from a continuous distribution. For the ISA task it is 
supposed that mixing matrix Hq G R^^x^^ j^^g column rank, i.e., its rank 
is Dg. 

2.2 Existing Decomposition Principles in the BSSD Problem Family 

There are numerous reduction methods for the BSSD problem in the literature. 
For example, its special case, the undercomplete BSD task can be reduced (i) to 
ISA by temporal concatenation of the observations [8], or (ii) to ICA by means of 
either spatio-temporal decorrelation [9], or by linear prediction (autoregressive 
(AR) estimation) [10-12]. As it was shown in [6], the uBSSD task can also be 
reduced to ISA by temporal concatenation. In Section 3, we show another route 
and describe how linear prediction can help to transcribe the uBSSD task to ISA. 
According to the ISA Separation Theorem [6,13], under certain conditions, the 
solution of the ISA task requires an ICA preprocessing step followed by a suitable 



X = H[z]s. 



(2) 



permutation of the ICA elements. This principle was conjectured in [4] on basis 
of numerical simulations. Only sufficient conditions are available in [6,13] for the 
ISA Separation Theorem. Possible reduction steps are shown in Fig. 1. 




Fig. 1: Extensions of the ICA task. Prefix u denotes the undercomplete case. Dotted 
arrows point to special cases. Solid arrows indicate possible reductions. Respective 
reduction principles are noted at the arrows. 



3 Reduction of uBSSD to ISA by Linear Prediction 

Below, we reduce the uBSSD task to ISA by means of linear prediction. The 
procedure is similar to that of [12], where it was applied for undercomplete BSD 
(i.e., for d = 1). 

Theorem, In the uBSSD task, observation process x(t) is autoregressive and its 
innovation x(t) := x(t) — ^[x(t)|x(t — l),x(t — 2), . . .] is Hos(t)^ where E['\'] 
denotes the conditional expectation value. Consequently, there is a polynomial 
matrix War[^] G R[z]^^^^^ such that War[^]x = Hqs. 

Proof. We assumed that H[z] has left inverse, thus the hidden s can be ex- 
pressed from observation x by causal FIR ffitering, i.e., s = H~^[2;]x, where 
H"M^] = J2n=o^nZ-'' e Rlz]^^""^- and TV denotes the degree of the B.-^[z] 
polynomial. Thus, terms in observation x that differ from Hos(t) in (1) belong to 
the linear hull of the finite history of x: x(t) = Hos(t) + X^/^i H/(H~^[2;]x)(t — 
I) G Hos(t) + (x(t — l),x(t — 2), . . . ,x(t — L + TV)). Because s{t) is independent 
of (x(t — 1), x(t — 2), ... , x(t — L + iV)), we have that observation process x(t) 
is autoregressive with innovation Hos(t). 

Thus, AR fit of x(t) can be used for the estimation of Hos(t). This innovation 
corresponds to the observation of an undercomplete ISA model^, which can be 
reduced to a complete ISA using principal component analysis (PC A). Finally, 
the solution can be finished by any ISA procedure. The pseudocode of the above 
linear predictive approximation (LPA) method for the uBSSD task is given in 
Table 1. 

^ Assumptions made for H[z] in the uBSSD task implies that Ho is of full column 
rank and thus the resulting ISA task is well defined. 



Table 1: Linear predictive approximation (LPA): Pseudocode 



Input of the algorithm 

Observation: {x{t)}t=i,...,T 
Optimization 

AR fit: for observation x estimate War[^] 

Estimate innovation: x = War[^]x 

Reduce uISA to ISA and whiten: x = WpcAX 

Apply ISA for x : separation matrix is Wisa 
Estimation 

Wubssd[^] = WisaWpcaWar[2:] 

s = WuBSSd[^]x 



The reduction procedure implies that hidden components s'^ can be recovered 
only up to the ambiguities of the ISA task. The ISA ambiguities are simple [14]: 
hidden multidimensional components can be determined up to permutation and 
up to invertible transformation within the subspaces. Furthermore, in the ISA 
model it can be assumed without any loss of generality, that both the hidden 
source (s) and the observation are white; their expectation values are zeroes and 
the covariance matrices are identities. Now, the s'^ components are determined 
up to permutation and orthogonal transformation. 

4 Illustrations 

We show the results of our studies concerning the efficiency of the algorithm 
of Table 1. We compare the LPA procedure with the uBSSD method described 
in [6]. There temporal concatenation was applied to transform the uBSSD task 
to a 'high-dimensional' ISA task. We shall refer to that method as the method 
of temporal concatenation, or TCC for short. Test problems are introduced in 
Section 4.1. The performance index that we use to measure the quality of the 
solutions is detailed in Section 4.2. Numerical results are presented in Section 4.3. 

4.1 Databases 

We define four databases (s) to study our LPA algorithm. These are the 
databases used in [6], too. In the 3D-geom test hidden components are ran- 
dom variables uniformly distributed on 3-dimensional geometric forms {d = 3). 
We have 6 components (M = 6). The dimension of the hidden source s is 
Ds = 18. See Fig. 2(a). The celebrities test has 10 of 2-dimensional source 
components generated from cartoons of celebrities {d = 2).^ The 2-dimensional 
images of celebrities are considered as the density functions of s^: sources are 
generated according to the pixel intensities. See Fig. 2(b). In the letters data set, 
hidden sources s'^ are uniformly distributed on 2-dimensional images {d = 2) of 



^ http://www.smileyworld.com 



letters A and B. The number of components and the dimension of the sources 
are minimal (M = 2, = 4). See Fig. 2(c). Our Beatles test is a non-i.i.d. ex- 
ample. Here, hidden sources are stereo Beatles songs. ^ 8 kHz sampled portions 
of two songs (A Hard Day's Night, Can't Buy Me Love) made the hidden s^s 
{d = 2,M = 2,Ds=4). 





A B 



(a) 



(c) 
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Fig. 2: Illustration of the 3D-geom, celebrities and letters databases, (a): Database 
3D-geom contains 6 of 3-dimensional components (M = 6, d = 3). Hidden sources 
are uniformly distributed variables on 3-dimensional geometric objects, (b): Database 
celebrities contains 10 of 2-dimensional components (M = 10, (i = 2). Density functions 
of the hidden sources (s"^) are proportional to the pixel intensities of the 2-dimensional 
images, (c): Letters database is minimal. Hidden sources s"^ are uniformly distributed 
on images of letters A and B (M = 2, d = 2). 



4.2 The Amari-index 

According to Section 3, in the ideal case, the product of matrix WisaWpca (the 
result of PC A and ISA) and matrix Ho, that is matrix G := WisaWpcaHq G 
^Ds><Ds ^ block-permutation matrix made oi d x d blocks. To measure this 
block-permutation property, we used the normalized version [13] of the Amari- 
error [15] adapted to the ISA task [5]. Namely, let matrix G G M^«x^« be 
decomposed into dx d blocks: G = [G*'-^] . ^. Let g^'^ denote the sum of 

the absolute values of the elements of matrix G*'-^ G M^^^. Now, the normalized 
Amari-error, the Amari-index (r(-) = rd,Ds{')) is defined as: 



^ ^ ' 2M(M - 1) 

For matrix G we have that < r(G) < 1. r(G) = if, and only if G is a 
block-permutation matrix with dx d sized blocks. Thus, r(G) = for a perfect 
G, whereas in the worst case r(G) = 1. Given that index r takes values in [0, 1] 



y maxj g'^^^ J y max^ g^^^ J 



^ http:/ /rock. mididb.com/beatles/ 



independently from d and Ds, we can use this measure to compare the TCC and 
LPA techniques. 

4.3 Simulations 

Results on databases 3D-geom, celebrities, letters and Beatles are provided here. 
The experimental studies concern two questions: 

1. The TCC and the LPA methods are compared on uBSSD tasks. 

2. The performance as a function of convolution length is studied for the LPA 
technique. 

Our test databases correspond to those of [6] and here, we study the 
Dx = 2Ds case, like in the cited reference. Both the TCC and the LPA method 
reduce the uBSSD task to ISA problems and we use the Amari-index (Sec- 
tion 4.2) to measure and compare their performances. For all values of the pa- 
rameters (sample number: T, convolution length: L + 1), we have averaged the 
performances upon 50 random initializations of s and il[z]. The coordinates of 
matrices were chosen independently from standard normal distribution. We 
used the Schwarz's Bayesian Criterion [16] to determine the optimal order of 
the AR process. The criterion was constrained: the order Q of the estimated AR 
process (see Table 1) was limited from above, the upper limit was set to twice the 
length of the convolution, i.e., Q < 2(L + 1). The AR process was then estimated 
by the method detailed in [16] and [17]. Both in the case of TCC and in the case 
of LPA, ISA was accomplished by joint f-decorrelation (JFD) as detailed in [18]. 

We studied the dependence of the precision versus the sample number on 
databases 3D-geom and celebrities. The dimension and the number of the com- 
ponents were d = 3 and M = 6 for the 3D-geom database and d = 2 and 
M = 10 for the celebrities database, respectively. In both cases the sample num- 
ber T varied between 1, 000 and 100, 000. The length of the convolution (L + 1) 
changed between 2 and 6. Comparisons with the TCC method are shown in 
Figs. 3(a)-(d). LPA estimation errors are given in Table 2. Figures 4(a)-(d) and 
(i)-(l) illustrate the estimations of the LPA technique on the 3D-geom and on 
the celebrities databases, respectively. 





L = 1 L = 2 


L = 3 


L = 4 


L = 5 


3D-geom 


0.20%(±0.01) 0.20%(±0.02) 


0.19%(±0.02) 


0.20%(±0.02) 


0.20%(=b0.01) 


celebrities 


0.33%(zb0.02) 0.33%(zb0.02) 0.34%(=b0.02) 0.34%(=b0.02) 0.34%(=b0.02) 



Table 2: The Amari-index of the LPA method for database 3D-geom and celebrities, for 
different convolution lengths: average zb deviation. Number of samples: T — 100, 000. 
For other sample numbers between 1, 000 <T< 100, 000, see Figs. 3(a) and (c). 



Figures 3(a) and (c) demonstrate that the LPA algorithm is able to un- 
cover the hidden components with high precisions. The Amari-index r decreases 




uBSSD: letters (LPA) uBSSD: letters (TCC/LPA) 




(g) (h) 

Fig. 3: Estimation error of the LPA method and comparisons with the TCC method for 
the 3D-geom, the celebrities, the letters and the Beatles databases. Scales are 'log log' 
plots. Data correspond to different convolution lengths (L + 1). (a), (c), (e) and (g): 
Amari-index as a function of sample number for the 3D-geom, celebrities, letters and 
Beatles databases, (b), (d), (f) and (h): Quotients of the Amari-indices of the TCC and 
the LPA methods: for quotient value g > 1, the LPA method is q times more precise 
than the TCC method. 



according to power law r(T) oc (c > 0) for sample numbers T > 2000. 
The power law is manifested by straight lines on log log scales. According to 
Figs. 3(b) and (d), the LPA method is superior to the TCC method (i) for all 
sample numbers 1,000 <T< 100,000, moreover (ii) LPA can provide reason- 
able estimates for much smaller sample numbers. This behavior is manifested by 
the initial steady increase of the quotients of the Amari indices of the TCC and 
LPA methods as a function of sample number followed by a sudden drop when 
the sample number enables reasonable TCC estimations, too. The LPA method 
resulted in 1.1 — 88-times increase of precision for the 3D-geom database and a 
similar 1.0 — 87-times increase for the celebrities database. According to Table 2, 
the Amari-index for sample number T = 100, 000 is 0.19 - 0.20% (0.33 - 0.34%) 
with small 0.01 — 0.02 (0.02) standard deviations for the 3D-geom {celebrities) 
database. Figures 4(e)- (h) and (m)-(q) demonstrate that the LPA method may 
provide acceptable estimations for reasonably small (T = 20, 000) sample num- 
bers up to convolution depth L = 20. 

In our test on 'letters^ and 'Beatles^ the number of components and their 
dimensions were minimal (c? = 2, M = 2). According to Figs. 3(e) and (g), the 
LPA method found the hidden components. For the letters dataset, the 'power 
law' decline of the Amari-index, that was apparent in the 3D-geom and the 
celebrities databases, appears too. For this dataset. Fig. 3(f) shows that the 
LPA method is more precise than the TCC method for all sample numbers. The 
quotient is between 1.2 — 110, and the form of the curve is similar to those of 
the SD-geom and celebrities databases. According to Table 3, for sample number 
T = 75,000 the Amari-index stays below 1% on average (0.3 — 0.36%) and has 
0.11 — 0.15 standard deviation. Visual inspection of Fig. 3(g) shows that the 
LPA method found the hidden components for sample number T > 30, 000 on 
the Beatles database. We found that the TCC method gave reliable solutions for 
sample number T = 50,000 or so. In addition, according to Fig. 3(h) the LPA 
method is more precise for T > 30, 000 than the TCC technique. The increase in 
precision becomes more pronounced for larger convolution parameter L. Namely, 
for sample number 75,000 and for L = 1,2,5,10,20,30 the ratios of precision 
are 1.50,2.24,4.33,4.42,9.03,11.13, respectively on the average. According to 
Table 3, for sample number T = 75,000 the Amari-index stays below 1% on 
average (0.4 — 0.71%) and has 0.02 — 0.08 standard deviation for the Beatles 
test. 
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L = 5 


L = 10 


L = 20 


L = 30 


0.32%(±0.11) 


0.36%(±0.14) 


0.34%(±0.13) 


0.34%(±0.15) 


0.34%(±0.11) 


0.30%(±0.14) 


0.71%(±0.06) 


0.64%(±0.07) 


0.53%(±0.02) 


0.75%(±0.07) 


0.45%(±0.08) 


0.40%(±0.06) 



Table 3: The Amari-index of the LPA method for database letters and Beatles for 
different convolution lengths: average zb deviation. Number of samples: T = 75, 000. 
First row: letters, second row: Beatles test. For other sample numbers between 1, 000 < 
T < 75,000, see Figs. 3(e) and (g). 
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Fig. 4: Illustration of the LPA method on the uBSSD task for the 3D-geom and 
celebrities databases, (a)-(d), (i)-(l): sample number T — 100,000, convolution length 
L + 1 = 6. (a) and (i): hidden components s^{t). (b) and (j): observed convolved sig- 
nals x(t), only 1000 time steps are shown, (c) and (k): Hinton-diagram of G, ideally 
block-permutation matrix with 2x2 (3x3) blocks, (d) and (1): estimated components 
(s"^), Amari-indices: 0.2% and 0.34%, respectively, (e)-(h) and (m)-(q): sample number 
T = 20, 000, dependence of estimated components (s"*) on the convolution parameter 
L. L is 1, 5, 10, 20, and 1, 5, 10, 15, 20 respectively. 
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Fig. 5: Illustration of the LPA method on the uBSSD task for the letters database. 
(a)-(d): sample number T = 75, 000, convolution length L + 1 = 31, Amari-index 0.3%. 
(a): hidden components s^(t). (b): observed convolved signals x(t), only 1000 time 
steps are shown, (c): Hinton-diagram of G, ideally block-permutation matrix with 2x2 
blocks, (d): estimated components (s^). (e)-(l): dependence of estimated components 
(s^) on the convolution parameter L. L is 1,5,10,20,50,100,200,230, respectively. 
Sample number is T = 15, 000. 



Both for database letters and database Beatles, the estimations are acceptable 
up to about L = 230 convolution depths for sample number T = 15,000. We 
illustrate this in Figs. 5(e)-(l) for the letters database with average Amari-index 
estimations. 



5 Summary 

We showed a novel solution method for the undercomplete case of the blind 
subspace deconvolution (uBSSD) task. We used a stepwise decomposition prin- 
ciple and reduced the problem with linear prediction to independent subspace 
analysis (ISA) task. We illustrated the method on different tests. Our method 
supersedes the temporal concatenation based uBSSD method, because (i) it gives 
rise to a smaller dimensional ISA task, (ii) it produces similar estimation errors 
at considerably smaller sample numbers, and (iii) it can treat deeper temporal 
convolutions. 
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